Abstract: The problem of model selection arises in a number
of contexts, such as compressed sensing, subset selection in linear 
regression, estimation of structures in graphical models, and 
signal denoising. This paper generalizes the notion of incoherence 
in the existing literature on model selection and introduces two 
fundamental measures of coherence — termed as the worst-case 
coherence and the average coherence — among the columns of a 
design matrix. In particular, it utilizes these two measures of 
coherence to provide an in-depth analysis of a simple one-step 
thresholding (OST) algorithm for model selection. One of the key 
insights offered by the ensuing analysis is that OST is feasible 
for model selection as long as the design matrix obeys an easily 
verifiable property. In addition, the paper also characterizes
the model-selection performance of OST in terms of the worst-case
coherence, $\mu$, and establishes that OST performs near-optimally
in the low signal-to-noise ratio regime for $N \times C$ design
matrices with $\mu \asymp O(N^{-1/2})$. Finally, in contrast to some of the
existing literature on model selection, the analysis in the paper is 
nonasymptotic in nature, it does not require knowledge of the true 
model order, it is applicable to generic (random or deterministic) 
design matrices, and it neither requires submatrices of the design 
matrix to have full rank, nor does it assume a statistical prior 
on the values of the nonzero entries of the data vector. 

I. Introduction 

In information processing problems involving high-dimensional
data, the "curse of dimensionality" can often be broken by
exploiting the fact that real-world data tend to live in
low-dimensional manifolds. This phenomenon is exemplified
by the important special case in which a data vector
$\alpha \in \mathbb{C}^C$ satisfies
$\|\alpha\|_0 = \sum_{i=1}^{C} 1_{\{|\alpha_i| > 0\}} \le k \ll C$
and is observed according to the linear measurement model
$f = \Phi\alpha + \eta$. Here, $\Phi$ is an $N \times C$ (real- or
complex-valued) matrix called the measurement or design matrix,
while $\eta \in \mathbb{C}^N$ represents noise in the measurement
system. In this problem, the fact that $\alpha$ is "$k$-sparse"
allows one to operate in the so-called "compressed" setting,
$k < N \ll C$, thereby enabling tasks that might be deemed
prohibitive otherwise.

Fundamentally, given a measurement vector $f = \Phi\alpha + \eta$
in the compressed setting, there are two complementary, but
nonetheless distinct, questions that one needs to answer:

[Estimation] Under what conditions can a $k$-sparse $\alpha$ be
reliably and efficiently reconstructed from $f$?

[Model Selection] Under what conditions can the locations
of the nonzero entries of a $k$-sparse $\alpha$ be reliably and
efficiently recovered from $f$?

A number of researchers have successfully addressed the 
estimation question over the past few years under the rubric of 
compressed sensing. In many application areas, however, the
model-selection question is equally, if not more, important
than the estimation question. In particular, the problem of
model selection (sometimes also known as variable selection
or sparsity pattern recovery) arises indirectly in a number
of contexts, such as subset selection in linear regression [1],
estimation of structures in graphical models [2], and signal
denoising [3]. In addition, solving the model-selection problem
sometimes also enables one to solve the estimation problem.

In this paper, we study the problem of polynomial-time
model selection in a compressed setting for the case when the
true model order $k$ is unknown. Despite being well-motivated
by applications, this problem has received less attention
compared to its estimation counterpart in the compressed sensing
literature; the most notable exceptions are [2], [4]-[8].
In particular, the results reported in [2], [4] establish that
the lasso [9] asymptotically identifies the correct model under
certain conditions on the design matrix $\Phi$ and the sparse vector
$\alpha$. Later, Wainwright in [5] strengthens the results of [2], [4]
and makes explicit the dependence of model selection using
the lasso on the smallest (in magnitude) nonzero entry of $\alpha$.
However, apart from the fact that the results reported in [2],
[4], [5] are asymptotic in nature, the main limitation of these
works is that explicit verification of the conditions that $\Phi$ needs
to satisfy is computationally intractable for $k > \sqrt{N}$.

The most general (and nonasymptotic) results for model
selection using the lasso have been reported in [6]. Specifically,
Candes and Plan establish in [6] that the lasso correctly
identifies most models with probability $1 - O(C^{-1})$ under
certain conditions on the smallest nonzero entry of $\alpha$ provided:
(i) the spectral norm (the largest singular value) and the
worst-case coherence (the maximum absolute inner product between
the columns) of $\Phi$ are not too large, and (ii) the values of
the nonzero entries of $\alpha$ are statistically independent (and
statistically symmetric around zero). The main limitation of
this work is that statistical independence among the nonzero
entries of $\alpha$ can be difficult to ensure in many applications.

Finally, as opposed to the approach taken in [2], [4]-[6],
the focus in [7], [8] is on model selection using a simple
thresholding algorithm. In particular, it is shown in both [7],
[8] that model selection using thresholding is asymptotically
optimal in the low signal-to-noise ratio (SNR) regime. However,
one of the main limitations of these works is that the
reported results are mainly asymptotic in nature and rely on
having some knowledge of the true model order. In addition,
the analysis carried out in [7] is for the specific case of an
independent and identically distributed (i.i.d.) Gaussian design
matrix, while the analysis carried out in [8] is for the specific
case of $\alpha$ with i.i.d. Gaussian nonzero entries.

A. Our Contributions 

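We begin by assuming that the design matrix $\Phi$ has unit
$\ell_2$-norm columns and introducing two fundamental measures
of coherence among the columns $\{\varphi_i \in \mathbb{C}^N\}$ of $\Phi$:

• Worst-Case Coherence: $\mu = \max_{i,j:\, i \neq j} |\langle \varphi_i, \varphi_j \rangle|$, and

• Average Coherence: $\nu = \frac{1}{C-1} \max_{i} \big| \sum_{j:\, j \neq i} \langle \varphi_i, \varphi_j \rangle \big|$.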

In words, worst-case coherence is a similarity measure
between the columns of a design matrix and average coherence
is a measure of the spread of the columns of a design matrix
within the $N$-dimensional unit ball. Our main objective in this
paper is to make use of these two measures of coherence in
order to analyze the one-step thresholding (OST) algorithm
(see Algorithm 1) for model selection. Algorithmically, this
makes our approach to model selection somewhat similar to
the one studied by Fletcher, Rangan, and Goyal [7] and Reeves
and Gastpar [8]. Analytically, however, the results reported in
this paper are more general in nature than the ones in [7], [8];
in particular, the asymptotic results of [7], [8] for thresholding
can be obtained as a special case of Theorem 1 in Section II.
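
As an illustration of the two coherence measures defined above, the following is a minimal NumPy sketch that evaluates them from the Gram matrix of a design matrix. The function name `coherences` and the small demo matrix are hypothetical choices made for this example, and the columns are assumed to be already normalized to unit $\ell_2$ norm.

```python
import numpy as np

def coherences(Phi):
    """Worst-case and average coherence of a matrix with unit-norm columns."""
    Phi = np.asarray(Phi)
    C = Phi.shape[1]
    G = Phi.conj().T @ Phi            # Gram matrix of pairwise column inner products
    off = G - np.diag(np.diag(G))     # discard the i == j terms
    mu = float(np.max(np.abs(off)))                         # worst-case coherence
    nu = float(np.max(np.abs(off.sum(axis=1))) / (C - 1))   # average coherence
    return mu, nu

# Hypothetical demo: three unit-norm columns in R^2, namely e1, e2, (e1 + e2)/sqrt(2).
Phi_demo = np.array([[1.0, 0.0, 2 ** -0.5],
                     [0.0, 1.0, 2 ** -0.5]])
mu_demo, nu_demo = coherences(Phi_demo)
```

Both quantities cost one $C \times C$ Gram-matrix product, so they remain cheap to evaluate even for deterministic designs.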

More specifically, Theorem 1 holds for any (random or
deterministic) design matrix with sufficiently small values of
the worst-case and average coherence, and the stated result
in that case is completely nonasymptotic in nature. Equally
importantly, unlike the case of [7], [8], the threshold value in
Theorem 1 is completely independent of the model order $k$ and
relies only on the knowledge of $\mu$, $C$, and SNR. In addition,
Theorem 1 can also be combined with the necessary conditions
for asymptotically consistent model selection reported in [7],
[10] to conclude that model selection using the OST is
asymptotically optimal in the low SNR regime for any design
matrix that has $\mu \asymp O(N^{-1/2})$ and $\nu \asymp O(N^{-1})$.

Finally, in order to compare the results obtained in this paper
for model selection using the OST with the nonasymptotic
results reported in [6, Theorem 1.3] for the lasso, Theorem 2
rederives Theorem 1 in terms of conditions on the model order
$k$ and the smallest nonzero entry of $\alpha$. In particular, it can be
easily concluded from Theorem 2 and [6, Theorem 1.3] that
the OST, despite being computationally primitive, performs
as well as the lasso for model selection in the low SNR
regime provided the design matrix has $\mu \asymp O(N^{-1/2})$ and
$\nu \asymp O(N^{-1})$. In addition, unlike the assumptions made in
[6], the OST achieves this without requiring that most $N \times k$
submatrices of $\Phi$ be well-conditioned and the nonzero entries
of $\alpha$ be statistically independent.

II. Main Result 

A. Problem Setup 

Before proceeding with presenting the main result of this 
paper, we need to be precise about our problem formulation. 
To this end, we begin by reconsidering the measurement model
$f = \Phi\alpha + \eta$ in the compressed setting ($k < N \ll C$) and take



Algorithm 1 The One-Step Thresholding (OST) Algorithm for Model Selection

Input: An $N \times C$ (real- or complex-valued) matrix $\Phi$, a vector
$f \in \mathbb{C}^N$, and a thresholding parameter $\lambda > 0$.
Output: Compute $y = \Phi^H f$ and return an estimate of the
model $\hat{S} = \{i \in \{1, \dots, C\} : |y_i| > \lambda\}$.



the noise vector $\eta$ to be distributed as $\mathcal{CN}(0, \sigma^2 I)$, although the
results can be readily generalized for other noise distributions.
We also assume without loss of generality that $\Phi$ has unit
$\ell_2$-norm columns and $\|\alpha\|_2 = 1$, since any scaling of $\Phi$ and $\alpha$
can be accounted for in the scaling of $\sigma$. In addition, we do
not impose any prior distribution on the design matrix $\Phi$ and
the nonzero entries of $\alpha$. Finally, we use the notation supp($\alpha$)
for the set containing the locations of the nonzero entries of $\alpha$
and assume, similar to the case of |6|, 0, that supp(a) is a 
uniformly random $k$-subset of $\{1, \dots, C\}$. In other words, we
have a uniform prior on the model supp($\alpha$).
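
The OST procedure of Algorithm 1 amounts to one matrix-vector product followed by an entrywise comparison. A minimal NumPy sketch is given below; the function name `one_step_thresholding` and the noiseless identity-matrix demo are hypothetical and serve only to show the mechanics. Indices are 0-based here, whereas the paper indexes columns from 1.

```python
import numpy as np

def one_step_thresholding(Phi, f, lam):
    """Return the estimated model {i : |y_i| > lam}, where y = Phi^H f."""
    y = Phi.conj().T @ f                        # correlate f with every column of Phi
    return set(np.flatnonzero(np.abs(y) > lam).tolist())

# Hypothetical noiseless demo with an orthonormal design (C = N = 4) and a
# 2-sparse unit-norm data vector.
Phi_demo = np.eye(4)
alpha_demo = np.array([0.0, 0.8, 0.0, -0.6])
f_demo = Phi_demo @ alpha_demo
support_demo = one_step_thresholding(Phi_demo, f_demo, lam=0.3)
```

In this demo the estimate coincides with the true support, since the design is orthonormal and no noise is present.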

B. The Coherence Property and Its Implications 

It is often realized in the literature that successful model
selection requires the columns of the design matrix to be
incoherent; see, e.g., [2], [4], [6]. Below, we mathematically
formalize this notion in terms of the coherence property.

Definition 1 (The Coherence Property). A matrix $\Phi$ is said to
obey the coherence property if the following conditions hold:



(CP-1) $\mu \le \frac{1}{\sqrt{10 \log C}}$, and

(CP-2) $\nu \le \frac{12\mu}{\sqrt{N}}$.

Notice that the coherence property can be easily verified
in polynomial time since it only requires checking that
$\|\Phi^H \Phi - I\|_{\max} \le (10 \log C)^{-1/2}$ and
$\|(\Phi^H \Phi - I)\mathbf{1}\|_\infty \le 12(C-1)N^{-1/2}\|\Phi^H \Phi - I\|_{\max}$.
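
The polynomial-time check described above can be sketched in a few lines of NumPy. The function name `obeys_coherence_property` is hypothetical, the constants 10 and 12 are taken as they appear in the two inequalities above, and the columns of the input are assumed to have unit $\ell_2$ norm.

```python
import numpy as np

def obeys_coherence_property(Phi):
    """Check (CP-1) and (CP-2) through the max norm and row sums of Phi^H Phi - I."""
    Phi = np.asarray(Phi)
    N, C = Phi.shape
    off = Phi.conj().T @ Phi - np.eye(C)              # Phi^H Phi - I
    max_norm = np.max(np.abs(off))                    # equals the worst-case coherence
    row_sum_norm = np.max(np.abs(off @ np.ones(C)))   # equals (C - 1) times the average coherence
    cp1 = max_norm <= (10 * np.log(C)) ** -0.5
    cp2 = row_sum_norm <= 12 * (C - 1) * N ** -0.5 * max_norm
    return bool(cp1 and cp2)
```

The cost is dominated by forming the $C \times C$ Gram matrix, which is what makes the property easy to verify compared with conditions that quantify over all $k$-column submatrices.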

Before proceeding with describing the implications of the 
coherence property, it is instructive to first define three funda- 
mental quantities as follows: 
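$\alpha_{\min} = \min_{i:\, |\alpha_i| > 0} |\alpha_i|$,
$\mathrm{SNR}_{\min} = \frac{\alpha_{\min}^2}{E[\|\eta\|_2^2]/k}$,
$\mathrm{MAR} = \frac{\alpha_{\min}^2}{\|\alpha\|_2^2/k} = \frac{\alpha_{\min}^2}{1/k}$.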

In words, $\alpha_{\min}$ is the magnitude of the smallest nonzero entry
of $\alpha$, $\mathrm{SNR}_{\min}$ is the ratio of the energy in the smallest nonzero
entry of $\alpha$ and the average noise energy per nonzero entry, and
MAR, which is termed as the minimum-to-average ratio [7],
is the ratio of the energy in the smallest nonzero entry of $\alpha$
and the average signal energy per nonzero entry. We are now
ready to state the main result of this paper.

Theorem 1. Suppose that $\Phi$ obeys the coherence property
and write its worst-case coherence as $\mu = c_1 N^{-1/\beta}$ for some
$c_1 > 0$ (which may depend on $\log C$) and $\beta \in (1, \infty]$. Next,
choose the threshold $\lambda = 4 \max\{12\mu\sqrt{2 \log C}, \sqrt{\sigma^2 \log C}\}$.
Then the OST satisfies $\Pr(\hat{S} \neq \mathrm{supp}(\alpha)) \le 9C^{-1}$ as long as
the number of measurements

$N > \max\left\{ 2k \log C,\ \frac{64}{\mathrm{SNR}_{\min}}\, k \log C,\ \left( \frac{2c_2}{\mathrm{MAR}}\, k \log C \right)^{\beta/2} \right\}$.

Here, the quantity $c_2 > 0$ is defined as $c_2 = (96 c_1)^2$.
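
To make the quantities in Theorem 1 concrete, the sketch below evaluates the threshold and the lower bound on the number of measurements. The function names are hypothetical, and the constants (4, 12, 2, 64, 96) follow the threshold stated in Theorem 2 and the conditions derived in the proof of Theorem 1; it is an illustration under those assumptions rather than a definitive implementation.

```python
import math

def ost_threshold(mu, C, sigma2):
    """lambda = 4 * max{12 * mu * sqrt(2 log C), sqrt(sigma^2 log C)}."""
    log_c = math.log(C)
    return 4.0 * max(12.0 * mu * math.sqrt(2.0 * log_c),
                     math.sqrt(sigma2 * log_c))

def theorem1_measurement_bound(k, C, snr_min, mar, c1, beta):
    """Largest of the three terms that N must exceed in Theorem 1."""
    c2 = (96.0 * c1) ** 2
    k_log_c = k * math.log(C)
    return max(2.0 * k_log_c,
               64.0 * k_log_c / snr_min,
               (2.0 * c2 * k_log_c / mar) ** (beta / 2.0))
```

Note that the threshold depends only on $\mu$, $C$, and the noise variance, and not on the model order $k$.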

Remark 1. The constants in the second and third terms in
the max expression can be significantly reduced if one is
only interested in showing the model-selection consistency of
OST; that is, $\lim_{C \to \infty} \Pr(\hat{S} \neq \mathrm{supp}(\alpha)) = 0$. One should be
particularly vigilant of this fact while comparing these results
to the asymptotic ones reported in [7] for thresholding.

Note that there are two fundamental but complementary
approaches that can be taken while analyzing an algorithm for
model selection, namely, the minimum measurement resources
approach and the permissible signal class approach. The
statement of Theorem 1 helps us analyze the OST for model
selection using the former approach and is best suited for
comparing our results with those in [7], [8], [10]. On the other
hand, Candes and Plan in [6] take the latter approach while
analyzing the lasso for model selection and the following result
is best suited for comparison purposes in this regard.

Theorem 2. Suppose that $\Phi$ obeys the coherence property and
choose the threshold $\lambda = 4 \max\{12\mu\sqrt{2 \log C}, \sqrt{\sigma^2 \log C}\}$.
Then, as long as $k \le N/(2 \log C)$ and

$\alpha_{\min} > \max\{8\sqrt{\sigma^2 \log C},\ 96\mu\sqrt{2 \log C}\}$

the OST satisfies $\Pr(\hat{S} \neq \mathrm{supp}(\alpha)) \le 9C^{-1}$.
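
The two conditions of Theorem 2 are simple to check numerically once $\mu$, $\sigma^2$, and $\alpha_{\min}$ are known. The helper below is a hypothetical sketch (the name `theorem2_conditions_hold` is not from the paper); it presumes that the coherence property has already been verified separately.

```python
import math

def theorem2_conditions_hold(k, N, C, mu, sigma2, alpha_min):
    """Check k <= N / (2 log C) and the lower bound on alpha_min from Theorem 2."""
    log_c = math.log(C)
    order_ok = k <= N / (2.0 * log_c)
    amplitude_ok = alpha_min > max(8.0 * math.sqrt(sigma2 * log_c),
                                   96.0 * mu * math.sqrt(2.0 * log_c))
    return order_ok and amplitude_ok
```

The second condition makes the permissible signal class explicit: the smallest nonzero entry must dominate both the noise level and the worst-case coherence.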
C. Discussion 

The statements of Theorem 1 and Theorem 2 can be best
put into perspective by considering specific examples of design
matrices. Because of space constraints, we only consider
here the case when $\Phi$ is an (appropriately normalized) i.i.d.
Gaussian matrix. It is a well-known fact in the literature
that the worst-case coherence of $\Phi$ in this case is roughly
$\mu \approx \sqrt{2 \log C / N}$ with high probability; see, e.g., [6]. In
addition, it can also be shown using the Bernstein inequality
that $\nu \le 12 N^{-1} \sqrt{2 \log C}$ with high probability in this case.
It therefore follows that a Gaussian design matrix obeys the
coherence property with high probability.
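
The scaling quoted above for Gaussian design matrices can be explored empirically. The following sketch draws one column-normalized Gaussian matrix and compares its coherences with $\sqrt{2 \log C / N}$ and $12 N^{-1} \sqrt{2 \log C}$; the dimensions, the seed, and all variable names are arbitrary choices for illustration, and a single draw is of course not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)              # fixed seed, for reproducibility only
N, C = 128, 512                             # hypothetical problem dimensions

Phi = rng.standard_normal((N, C))
Phi /= np.linalg.norm(Phi, axis=0)          # normalize columns to unit l2 norm

off = Phi.T @ Phi - np.eye(C)               # Phi^H Phi - I for a real matrix
mu_hat = np.abs(off).max()                          # empirical worst-case coherence
nu_hat = np.abs(off.sum(axis=1)).max() / (C - 1)    # empirical average coherence

mu_theory = np.sqrt(2 * np.log(C) / N)      # rough scaling of mu quoted in the text
nu_bound = 12 * np.sqrt(2 * np.log(C)) / N  # bound on nu quoted in the text
```

For draws of this kind one typically sees `mu_hat` within a small constant factor of `mu_theory`, while `nu_hat` sits well below `nu_bound`.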

Theorem 1 therefore implies that the OST identifies the
correct model in this case with high probability as long
as $N \gtrsim \max\left\{ \frac{k \log C}{\mathrm{SNR}_{\min}}, \frac{k \log C}{\mathrm{MAR}} \right\}$. In particular, this expression
reduces to $N \gtrsim \frac{k \log C}{\mathrm{SNR}_{\min}}$ for the case of $\mathrm{SNR} = 1/E[\|\eta\|_2^2] < 1$.
On the other hand, we have from [7], [10] that no scheme
can asymptotically identify the correct model if $N$ falls below
a threshold of this same order.
This proves the near-optimality of the OST for model selection
in the low SNR regime for any design matrix that has
$\mu \asymp O(N^{-1/2})$ and $\nu \asymp O(N^{-1})$. Finally, note that we could
have made a similar conclusion by focusing on Theorem 2
and comparing the conditions in the low SNR regime in that
case with those in [6, Theorem 1.3].

III. Proofs 

The general roadmap for the proofs of Theorem 1 and
Theorem 2 is as follows. Below, we first introduce the notion
of $(k, \epsilon, \delta)$-statistical orthogonality condition (StOC). Next,
we establish in Lemma 1 that if $\Phi$ satisfies the StOC then
OST recovers the support of $\alpha$ with high probability provided
$\alpha_{\min}$ is large enough. Subsequently, we establish in Lemma 2
and Lemma 3 the relationship between the StOC parameters
and the worst-case and average coherence of $\Phi$. The proofs of
Theorem 1 and Theorem 2 then follow by judiciously
combining the results of these three lemmas using the coherence
property.

Definition 2 ($(k, \epsilon, \delta)$-StOC). Let $\Pi = (\pi_1, \dots, \pi_k)$ be a
uniformly random (ordered) $k$-subset of $\{1, \dots, C\}$ and let
$\Pi^c = \{1, \dots, C\} - \Pi$. Then, given $\epsilon, \delta \in [0, 1)$, $\Phi$ is said
to satisfy the $(k, \epsilon, \delta)$-statistical orthogonality condition if the
following inequalities

(StOC-1) $\|(\Phi_\Pi^H \Phi_\Pi - I)z\|_\infty \le \epsilon \|z\|_2$
(StOC-2) $\|\Phi_{\Pi^c}^H \Phi_\Pi z\|_\infty \le \epsilon \|z\|_2$

hold for every fixed $z \in \mathbb{C}^k$ with probability exceeding $1 - \delta$
(with respect to the choice of $\Pi$).

Remark 2. Note that the StOC derives its name from the fact
that if $\Phi$ is an orthogonal matrix then it trivially satisfies the
StOC for every $k$ with $\epsilon = \delta = 0$.
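
Since the StOC is a statement about a uniformly random $k$-subset, its failure probability for a fixed $z$ can be estimated by simple Monte Carlo sampling. The sketch below is a hypothetical illustration (the function name, the number of trials, and the seed are arbitrary); it assumes $k < C$ and reports the fraction of sampled subsets for which (StOC-1) or (StOC-2) is violated.

```python
import numpy as np

def stoc_failure_rate(Phi, z, eps, trials=200, seed=0):
    """Monte Carlo estimate of Pr{(StOC-1) or (StOC-2) fails} for a fixed z."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z)
    C, k = Phi.shape[1], len(z)
    G = Phi.conj().T @ Phi                   # Gram matrix Phi^H Phi
    bound = eps * np.linalg.norm(z)
    failures = 0
    for _ in range(trials):
        Pi = rng.permutation(C)[:k]          # uniformly random ordered k-subset
        in_Pi = np.zeros(C, dtype=bool)
        in_Pi[Pi] = True
        Gz = G[:, Pi] @ z                    # Phi^H Phi_Pi z, one entry per column of Phi
        stoc1 = np.max(np.abs(Gz[Pi] - z)) <= bound      # (Phi_Pi^H Phi_Pi - I) z
        stoc2 = np.max(np.abs(Gz[~in_Pi])) <= bound      # Phi_{Pi^c}^H Phi_Pi z
        failures += int(not (stoc1 and stoc2))
    return failures / trials
```

Consistent with Remark 2, an orthogonal matrix yields a failure rate of zero even with `eps = 0`.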

Lemma 1. Let $\Pi = \mathrm{supp}(\alpha)$ be a uniformly random
$k$-subset of $\{1, \dots, C\}$. Further, suppose that the matrix
$\Phi$ satisfies the $(k, \epsilon, \delta)$-StOC and choose the threshold as
$\lambda = 2 \max\{\epsilon, 2\sqrt{\sigma^2 \log C}\}$. Then, under the assumption that
$\alpha_{\min} > 2\lambda$, the OST satisfies

$\Pr(\hat{S} \neq \Pi) \le \delta + 2\left(\sqrt{2\pi \log C} \cdot C\right)^{-1}$.

Proof: We begin by defining $z^T = [\alpha_{\pi_1} \dots \alpha_{\pi_k}]$ and
writing the vector $y = \Phi^H f$ as $y = \Phi^H \Phi_\Pi z + \Phi^H \eta$. Now, let
$\Pi^c = \{1, \dots, C\} - \Pi$ and note that in order to establish that
$\hat{S} = \Pi$ we need to show that $\|y_{\Pi^c}\|_\infty \le \lambda$ and $\min_i |y_{\pi_i}| > \lambda$.

In this regard, first note that $\tilde{\eta} = \Phi^H \eta$ is a complex Gaussian
random vector whose entries are identically (although not
independently) distributed as $\mathcal{CN}(0, \sigma^2)$. It therefore follows
from the tail bound on the maximum of $C$ arbitrary complex
Gaussian random variables that $\|\tilde{\eta}\|_\infty \le 2\sqrt{\sigma^2 \log C}$ with
probability exceeding $1 - 2(\sqrt{2\pi \log C} \cdot C)^{-1}$. Further, define

$\mathcal{G} = \{\|\tilde{\eta}\|_\infty \le 2\sqrt{\sigma^2 \log C}\} \cap \{(\text{StOC-1}) \cap (\text{StOC-2})\}$

and notice that, since the noise is independent of $\Pi$, we have
$\Pr(\mathcal{G}) > 1 - \delta - 2(\sqrt{2\pi \log C} \cdot C)^{-1}$. In addition, conditioned
on the event $\mathcal{G}$, we have

$\|y_{\Pi^c}\|_\infty \overset{(a)}{\le} \|\Phi_{\Pi^c}^H \Phi_\Pi z\|_\infty + \|\tilde{\eta}\|_\infty \overset{(b)}{\le} \epsilon + 2\sqrt{\sigma^2 \log C} \overset{(c)}{\le} \lambda$   (1)

where (a) follows from the triangle inequality, (b) is a
consequence of the conditioning on $\mathcal{G}$ together with
$\|z\|_2 = \|\alpha\|_2 = 1$, and (c) follows from
the fact that $\lambda = 2 \max\{\epsilon, 2\sqrt{\sigma^2 \log C}\}$.

Finally, in order to show that $\min_i |y_{\pi_i}| > \lambda$, we define
$r = (\Phi_\Pi^H \Phi_\Pi - I)z$ and note that, conditioned on the event $\mathcal{G}$,
we have for any $i \in \{1, \dots, k\}$

$|y_{\pi_i}| = |\alpha_{\pi_i} + r_i + \tilde{\eta}_{\pi_i}| \ge |\alpha_{\pi_i}| - \|r\|_\infty - \|\tilde{\eta}\|_\infty \overset{(d)}{>} 2\lambda - \epsilon - 2\sqrt{\sigma^2 \log C} \overset{(e)}{\ge} \lambda$   (2)

where (d) follows from the conditioning on $\mathcal{G}$ and the assumption
that $\alpha_{\min} > 2\lambda$, while (e) is a simple consequence of the
choice of $\lambda$. This completes the proof of the lemma since we
have now shown that $\Pr(\hat{S} \neq \Pi) \le \Pr(\mathcal{G}^c)$. ■
Having established Lemma 1, our next goal is to relate the
StOC parameters with the worst-case and average coherence.

Lemma 2. Let $\Pi = (\pi_1, \dots, \pi_k)$ be a uniformly random
(ordered) $k$-subset of $\{1, \dots, C\}$. Then, for any fixed $z \in \mathbb{C}^k$,
$\epsilon > 0$, and $k \le \min\{\epsilon^2 \nu^{-2}/4, C/2\}$, we have

$\Pr(\{\Phi \text{ does not satisfy (StOC-1)}\}) \le 4k \exp\left( -\frac{\epsilon^2 \mu^{-2}}{576} \right)$.

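Proof: The proof of this lemma relies heavily on the so-called
method of bounded differences (MOBD) [11]. Specifically,
note that $\|(\Phi_\Pi^H \Phi_\Pi - I)z\|_\infty = \max_i \big| \sum_{j \neq i} z_j \langle \varphi_{\pi_i}, \varphi_{\pi_j} \rangle \big|$
and define $\Pi^{-i} = (\pi_1, \dots, \pi_{i-1}, \pi_{i+1}, \dots, \pi_k)$. Then for a
fixed index $i$, and conditioned on the event $A_{i'} = \{\pi_i = i'\}$,
we have the following equality from basic probability theory

$\Pr\Big( \big| \sum_{j \neq i} z_j \langle \varphi_{\pi_i}, \varphi_{\pi_j} \rangle \big| > \epsilon \|z\|_2 \,\Big|\, A_{i'} \Big) = \Pr\Big( \big| \sum_{j \neq i} z_j \langle \varphi_{i'}, \varphi_{\pi_j} \rangle \big| > \epsilon \|z\|_2 \,\Big|\, A_{i'} \Big)$.   (3)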



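Next, in order to apply the MOBD, we construct a Doob's
martingale sequence $(M_0, M_1, \dots, M_{k-1})$ as follows:

$M_0 = E\Big[ \sum_{j \neq i} z_j \langle \varphi_{i'}, \varphi_{\pi_j} \rangle \,\Big|\, A_{i'} \Big]$, and
$M_\ell = E\Big[ \sum_{j \neq i} z_j \langle \varphi_{i'}, \varphi_{\pi_j} \rangle \,\Big|\, \pi^{-i}_{1 \to \ell}, A_{i'} \Big]$, $\ell = 1, \dots, k-1$   (4)

where $\pi^{-i}_{1 \to \ell}$ is the first $\ell$ coordinates of $\Pi^{-i}$. Here, note that

$|M_0| \overset{(a)}{\le} \sum_{j \neq i} |z_j| \Big| \frac{1}{C-1} \sum_{q=1,\, q \neq i'}^{C} \langle \varphi_{i'}, \varphi_q \rangle \Big| \overset{(b)}{\le} \sqrt{k}\, \nu \|z\|_2$   (5)

where (a) follows since, conditioned on $A_{i'}$, $\pi_j$ has a uniform
distribution over $\{1, \dots, C\} - \{i'\}$, while (b) mainly follows
from the definition of average coherence. Further, if we define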



$M_\ell(x) = E\Big[ \sum_{j \neq i} z_j \langle \varphi_{i'}, \varphi_{\pi_j} \rangle \,\Big|\, \pi^{-i}_{1 \to \ell-1},\ \pi^{-i}_\ell = x,\ A_{i'} \Big]$   (6)

then, since $(M_0, M_1, \dots, M_{k-1})$ is a Doob's martingale
sequence, it can be easily verified that $|M_\ell - M_{\ell-1}|$ is
upper-bounded by $\sup_{x,y} [M_\ell(x) - M_\ell(y)]$ (see, e.g., [12]).

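Next, in order to upper-bound $\sup_{x,y} [M_\ell(x) - M_\ell(y)]$, we
first define

$d_{\ell,j} = E\big[ \langle \varphi_{i'}, \varphi_{\pi_j} \rangle \,\big|\, \pi^{-i}_{1 \to \ell-1},\ \pi^{-i}_\ell = x,\ A_{i'} \big] - E\big[ \langle \varphi_{i'}, \varphi_{\pi_j} \rangle \,\big|\, \pi^{-i}_{1 \to \ell-1},\ \pi^{-i}_\ell = y,\ A_{i'} \big]$

and then notice that

$|M_\ell(x) - M_\ell(y)| \le \sum_{j \le \ell+1,\, j \neq i} |z_j| |d_{\ell,j}| + \sum_{j > \ell+1,\, j \neq i} |z_j| |d_{\ell,j}|$.   (7)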



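In addition, we have that for every $j > \ell+1$, $j \neq i$, the random
variable $\pi_j$ has a uniform distribution over $\{1, \dots, C\} - \{\pi^{-i}_{1 \to \ell-1}, x, i'\}$
when conditioned on $\{\pi^{-i}_{1 \to \ell-1},\ \pi^{-i}_\ell = x,\ A_{i'}\}$,
while $\pi_j$ has a uniform distribution over $\{1, \dots, C\} - \{\pi^{-i}_{1 \to \ell-1}, y, i'\}$
when conditioned on $\{\pi^{-i}_{1 \to \ell-1},\ \pi^{-i}_\ell = y,\ A_{i'}\}$.
Therefore, we have for every $j > \ell+1$, $j \neq i$ that

$|d_{\ell,j}| = \frac{1}{C-\ell-1} \big| \langle \varphi_{i'}, \varphi_y \rangle - \langle \varphi_{i'}, \varphi_x \rangle \big| \le \frac{2\mu}{C-k}$.   (8)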



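Similarly, it can be argued that
$\sum_{j \le \ell+1,\, j \neq i} |z_j| |d_{\ell,j}| \le |z_{\ell+1}|\, 2\mu$ when $i \le \ell$,
$\sum_{j \le \ell+1,\, j \neq i} |z_j| |d_{\ell,j}| \le |z_\ell|\, 2\mu$ when $i = \ell+1$,
and $\sum_{j \le \ell+1,\, j \neq i} |z_j| |d_{\ell,j}| \le \big( |z_\ell| + \frac{|z_{\ell+1}|}{C-k} \big) 2\mu$ when $i > \ell+1$.
Consequently, it can be easily verified that

$\sup_{x,y} [M_\ell(x) - M_\ell(y)] \le 2\mu \Big( |z_\ell| + |z_{\ell+1}| + \frac{\|z\|_1}{C-k} \Big)$.   (9)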



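We have now established that $(M_0, M_1, \dots, M_{k-1})$ is a
bounded-difference martingale with $|M_\ell - M_{\ell-1}| \le c_\ell$ for
$\ell = 1, \dots, k-1$, where $c_\ell$ denotes the right-hand side of (9).
Further, it can also be verified from (9)
that $\sum_{\ell=1}^{k-1} c_\ell^2 \le 36\mu^2 \|z\|_2^2$ since $k \le C/2$. In addition,
since $|M_0| \le \sqrt{k}\, \nu \|z\|_2$ and $k \le \epsilon^2 \nu^{-2}/4$, we have from
the Azuma inequality for bounded-difference martingale
sequences [11] adapted to the complex-valued setup that

$\Pr\Big( \big| \sum_{j \neq i} z_j \langle \varphi_{i'}, \varphi_{\pi_j} \rangle \big| > \epsilon \|z\|_2 \,\Big|\, A_{i'} \Big) \le \Pr\Big( |M_{k-1} - M_0| > \frac{\epsilon}{2} \|z\|_2 \,\Big|\, A_{i'} \Big) \le 4 \exp\left( -\frac{\epsilon^2 \mu^{-2}}{576} \right)$.   (10)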



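Combining all these facts together, we therefore finally obtain

$\Pr\big( \|(\Phi_\Pi^H \Phi_\Pi - I)z\|_\infty > \epsilon \|z\|_2 \big) \overset{(c)}{\le} k \sum_{i'=1}^{C} \Pr\Big( \big| \sum_{j \neq i} z_j \langle \varphi_{\pi_i}, \varphi_{\pi_j} \rangle \big| > \epsilon \|z\|_2 \,\Big|\, A_{i'} \Big) \Pr(A_{i'}) \overset{(d)}{\le} 4k \exp\left( -\frac{\epsilon^2 \mu^{-2}}{576} \right)$   (11)

where (c) follows from the union bound and the fact that the
$\pi_i$'s are identically (although not independently) distributed,
while (d) follows from (10) and the fact that $\pi_i$ has a uniform
distribution over $\{1, \dots, C\}$. ■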
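
The constant 576 in the exponent can be traced with a short calculation. The block below is a sketch that assumes the complex-valued Azuma bound takes the form $4\exp(-t^2 / (4\sum_\ell c_\ell^2))$; this form is an assumption made here because it reproduces the stated constant, and it is not quoted from the cited reference.

```latex
\begin{align*}
\Pr\big(|M_{k-1} - M_0| > t\big)
  &\le 4\exp\Big(-\frac{t^2}{4\sum_{\ell} c_\ell^2}\Big)
  && \text{(assumed form of the bound)} \\
t = \tfrac{\epsilon}{2}\|z\|_2, \quad \sum_{\ell} c_\ell^2 \le 36\mu^2\|z\|_2^2
  &\implies \frac{t^2}{4\sum_{\ell} c_\ell^2}
     \ge \frac{\epsilon^2\|z\|_2^2/4}{144\,\mu^2\|z\|_2^2}
     = \frac{\epsilon^2\mu^{-2}}{576}.
\end{align*}
```

Under the same assumed form, replacing $36\mu^2\|z\|_2^2$ by $16\mu^2\|z\|_2^2$ gives the exponent $\epsilon^2\mu^{-2}/256$.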

Lemma 3. Let $\Pi = (\pi_1, \dots, \pi_k)$ be a uniformly random
(ordered) $k$-subset of $\{1, \dots, C\}$. Further, define the random
subset $\Pi^c = \{1, \dots, C\} - \Pi$. Then, for any fixed $z \in \mathbb{C}^k$,
$\epsilon > 0$, and $k \le \min\{\epsilon^2 \nu^{-2}/4, C/2\}$, we have

$\Pr(\{\Phi \text{ does not satisfy (StOC-2)}\}) \le 4C \exp\left( -\frac{\epsilon^2 \mu^{-2}}{256} \right)$.



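Proof Sketch: The proof of this lemma also relies on
the MOBD and is very similar to that of Lemma 2. As such,
we only provide a sketch of the proof here. To begin with,
we note that $\|\Phi_{\Pi^c}^H \Phi_\Pi z\|_\infty = \max_{i \in [C-k]} \big| \sum_{j} z_j \langle \varphi_{\pi^c_i}, \varphi_{\pi_j} \rangle \big|$, where
$[C-k] = \{1, \dots, C-k\}$ and $\pi^c_i$ denotes the $i$-th coordinate
of $\Pi^c$. Then for a fixed index $i \in [C-k]$, and conditioned on
the event $A_{i'} = \{\pi^c_i = i'\}$, we have the following equality

$\Pr\Big( \big| \sum_{j} z_j \langle \varphi_{\pi^c_i}, \varphi_{\pi_j} \rangle \big| > \epsilon \|z\|_2 \,\Big|\, A_{i'} \Big) = \Pr\Big( \big| \sum_{j} z_j \langle \varphi_{i'}, \varphi_{\pi_j} \rangle \big| > \epsilon \|z\|_2 \,\Big|\, A_{i'} \Big)$.   (12)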



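Next, as in the case of Lemma 2, we construct a Doob's
martingale sequence $(M_0, M_1, \dots, M_k)$ as follows:

$M_0 = E\Big[ \sum_{j=1}^{k} z_j \langle \varphi_{i'}, \varphi_{\pi_j} \rangle \,\Big|\, A_{i'} \Big]$, and
$M_\ell = E\Big[ \sum_{j=1}^{k} z_j \langle \varphi_{i'}, \varphi_{\pi_j} \rangle \,\Big|\, \pi_{1 \to \ell}, A_{i'} \Big]$, $\ell = 1, \dots, k$   (13)

where $\pi_{1 \to \ell}$ now denotes the first $\ell$ coordinates of $\Pi$. It can
now be argued (as in Lemma 2) that: (i) $|M_0| \le \sqrt{k}\, \nu \|z\|_2$,
(ii) $|M_\ell - M_{\ell-1}| \le 2\mu \big( |z_\ell| + \frac{\|z\|_1}{C-k} \big) = c_\ell$, and
(iii) $\sum_{\ell=1}^{k} c_\ell^2 \le 16\mu^2 \|z\|_2^2$. Therefore, since $k \le \epsilon^2 \nu^{-2}/4$, we once
again have from the (complex) Azuma inequality that (12) is
upper-bounded by $4 \exp\left( -\frac{\epsilon^2 \mu^{-2}}{256} \right)$. Combining all these facts
together, we finally obtain the claimed result as follows

$\Pr\big( \|\Phi_{\Pi^c}^H \Phi_\Pi z\|_\infty > \epsilon \|z\|_2 \big) \overset{(a)}{\le} 4C \exp\left( -\frac{\epsilon^2 \mu^{-2}}{256} \right)$   (14)

where (a) mainly follows from the union bound and the fact
that $\pi^c_i$ has a uniform distribution over $\{1, \dots, C\}$. ■
We are finally ready to prove the main results of this paper
using Lemmata 1-3.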

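Proof of Theorem 1: Note that Lemma 2 and Lemma 3
imply that if $\Phi$ has worst-case coherence $\mu$ and average
coherence $\nu$ then, as long as $k \le C/2$, $\Phi$ satisfies the $(k, \epsilon, \delta)$-StOC
for any $\epsilon \in [2\sqrt{k}\,\nu, 1)$ with $\delta = 8C \exp\left( -\frac{\epsilon^2 \mu^{-2}}{576} \right)$.

Now let $k \le N/(2 \log C)$ and define $\epsilon' = 24\mu\sqrt{2 \log C}$.
Then, since $\Phi$ satisfies the coherence property, we have
$2\sqrt{k}\,\nu \le \epsilon' < 1$ and therefore $\Phi$ satisfies the $(k, \epsilon', \delta')$-StOC
with $\delta' = 8C^{-1}$. Consequently, Lemma 1 states that
$\Pr(\hat{S} \neq \mathrm{supp}(\alpha)) \le 9C^{-1}$ as long as $N > 2k \log C$,
$\alpha_{\min} > 4\epsilon'$, and $\alpha_{\min} > 8\sqrt{\sigma^2 \log C}$. Further, note that

$\alpha_{\min} > 8\sqrt{\sigma^2 \log C} \iff N > \frac{64}{\mathrm{SNR}_{\min}}\, k \log C$, and
$\alpha_{\min} > 4\epsilon' \iff N > \left( \frac{2c_2}{\mathrm{MAR}}\, k \log C \right)^{\beta/2}$.

This completes the proof of the theorem. ■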
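
For completeness, the two equivalences used at the end of the proof can be checked directly from the definitions of $\mathrm{SNR}_{\min}$ and MAR. The derivation below is a sketch that uses $E[\|\eta\|_2^2] = N\sigma^2$ for $\eta \sim \mathcal{CN}(0, \sigma^2 I)$, the normalization $\|\alpha\|_2 = 1$, and $\mu = c_1 N^{-1/\beta}$ with finite $\beta$.

```latex
\begin{align*}
\mathrm{SNR}_{\min} &= \frac{\alpha_{\min}^2}{E[\|\eta\|_2^2]/k}
   = \frac{k\,\alpha_{\min}^2}{N\sigma^2},
\qquad
\mathrm{MAR} = k\,\alpha_{\min}^2, \\
\alpha_{\min} > 8\sqrt{\sigma^2\log C}
  &\iff \frac{\mathrm{SNR}_{\min}\,N\sigma^2}{k} > 64\,\sigma^2\log C
   \iff N > \frac{64}{\mathrm{SNR}_{\min}}\,k\log C, \\
\alpha_{\min} > 4\epsilon' = 96\,c_1 N^{-1/\beta}\sqrt{2\log C}
  &\iff \frac{\mathrm{MAR}}{k} > 2\,(96 c_1)^2 N^{-2/\beta}\log C
   \iff N > \Big(\frac{2c_2}{\mathrm{MAR}}\,k\log C\Big)^{\beta/2}.
\end{align*}
```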
Proof of Theorem 2: The proof of this theorem follows
along similar lines as that of Theorem 1 and is therefore
omitted here for the sake of brevity. ■

IV. Conclusion 

In this paper, we have analyzed the one-step thresholding
(OST) algorithm for model selection in terms of the worst-case
and average coherence of the design matrix. In stark
contrast to the existing work on model selection using
thresholding, our analysis is completely nonasymptotic in nature,
it does not require knowledge of the true model order, and
it is applicable to arbitrary (random or deterministic) design
matrices. In particular, we have established in the paper that
the OST can be used for model selection as long as the
design matrix obeys an easily verifiable property. Further, we
have specified the dependence of the OST performance on
the worst-case coherence of the design matrix and shown that
it performs near-optimally in the low SNR regime for design
matrices with $O(N^{-1/2})$ worst-case coherence. Finally, unlike
the assumptions made in [6], our analysis also does not require
that most $N \times k$ submatrices of $\Phi$ be well-conditioned and the
nonzero entries of the data vector be statistically independent.